1. Filter primary dataset to generate proxy-specific subsets

First, import the primary dataset that was generated in Appendix 1.

load("Filtered.trace.toc.full.20220328.RData")

Load required packages.

library(dplyr)
library(deeptime)
library(ggplot2)
library(randomForest)

Update base of the Cryogenian in deeptime.

periods.edit <- deeptime::periods
periods.edit[14,2] <- 720
periods.edit[15,3] <- 720

Categorical variables as factors

It will be useful for the treatment of samples with partial data that categorical geological context variables are factor objects, not character objects. We therefore assign all categorical variables to be factors now.

trace.toc.full$site.type <- factor(trace.toc.full$site.type)
trace.toc.full$metamorphic.bin <- factor(trace.toc.full$metamorphic.bin)
trace.toc.full$basin.type <- factor(trace.toc.full$basin.type)
trace.toc.full$environmental.bin <- factor(trace.toc.full$environmental.bin)
trace.toc.full$lithology.name <- factor(trace.toc.full$lithology.name)

We then filter this primary dataset to generate the specific subdatasets used in our primary analyses of different proxies. We further calculate the number of rows in each of these subset dataframes.

Molybdenum

For our primary Mo random forest analyses, we are interested in samples deposited in anoxic depositional environments that also have iron speciation (specifically Fepy/FeHR) data. We therefore filter the primary dataset to include only samples that are classified as anoxic based upon the iron speciation proxy (samples without iron speciation data that would be classified as anoxic based on Fe/Al will not have the require variables for the primary random forest analyses). We also remove any samples with no Mo, TOC and/or Fepy/FeHR data.

Mo.anox.py.rf <- trace.toc.full %>%
  filter(!is.na(Mo..ppm.) & !is.na(Fe.py.FeHR) & !is.na(TOC..wt..)) %>%
  filter(FeHR.FeT >= 0.38)

nrow(Mo.anox.py.rf)
## [1] 2363

Uranium

For our primary U random forest analyses, we are only interested in samples deposited in anoxic (ferruginous or euxinic) depositional environments. We therefore filter the primary dataset to include only samples that are classified as anoxic based upon iron speciation or Fe/Al ratios. We also remove any samples with no U and/or TOC data.

U.anox.rf <- trace.toc.full %>%
  filter(!is.na(U..ppm.) & !is.na(TOC..wt..)) %>%
  filter(FeHR.FeT >= 0.38 | FeT.Al >= 0.53)

nrow(U.anox.rf)
## [1] 3410

Proportion euxinic

For our primary random forest analyses of the proportion of samples that are euxinic, we are only interested in samples deposited in anoxic (ferruginous or euxinic) depositional environments. Given that we use iron speciation to determine the proportion of euxinic samples (and therefore all samples in this analyses must have full iron speciation data), we therefore filter the primary dataset to include only samples that are classified as anoxic based upon iron speciation data. We also remove samples with no Fepy/FeHR data, as with the other analyses (although the requirement of the filtering step to have FeHR/FeT data should achieve this as no samples should have partial iron speciation data).

Fepy.anox.rf <- trace.toc.full %>%
  filter(!is.na(Fe.py.FeHR)) %>%
  filter(FeHR.FeT >= 0.38)

nrow(Fepy.anox.rf)
## [1] 3922

For the analysis of the proportion of euxinic samples, we also need to code samples based upon whether they are euxinic (based on iron speciation) in a binary fashion, following Sperling et al. (2015, Nature).

Fepy.anox.rf$euxinic.Fe[Fepy.anox.rf$Fe.py.FeHR >= 0.7] <- 1
Fepy.anox.rf$euxinic.Fe[Fepy.anox.rf$Fe.py.FeHR < 0.7] <- 0

Total organic carbon

For our primary random forest analyses of total organic carbon we use no redox filter. We just remove samples with no TOC data.

TOC.all.rf <- trace.toc.full %>%
  filter(!is.na(TOC..wt..))

nrow(TOC.all.rf)
## [1] 12565

2. Treatment of samples with partial data

Age uncertainty

We define a function to assign age uncertainties to samples with missing maximum and minimum age estimates (i.e. only interpreted age). We add ±5Myrs uncertainty on Phanerozoic ages and ±12.5Myrs on Neoproterozoic ages as default.

age.unc <- function(data, Phanerozoic = 5, Proterozoic = 12.5){

  # select which of the vector c(5,12.5) is appropriate by identifying which of the samples without min/max age have an interpreted age greater than 541. If TRUE, adding 1 to "TRUE" (1) gives 2 (i.e. 12.5Myr error from the vector c(5,12.5)). Adding 1 to "FALSE" (0) gives 1 (i.e. 5Myr error from the vector c(5,12.5)).
    data$min.age[is.na(data$min.age)] <- data$interpreted.age[is.na(data$min.age)] - c(5,12.5)[(data$interpreted.age[is.na(data$min.age)] > 541) + 1]

  data$max.age[is.na(data$max.age)] <- data$interpreted.age[is.na(data$max.age)] + c(5,12.5)[(data$interpreted.age[is.na(data$max.age)] > 541) + 1]

  return(data)
}

Confirm that age assignment function works as intended using primary Mo subsdataset as an example. Here we look only at the samples with missing ages by filtering the dataset.

Mo.anox.py.rf.missing.ages <- Mo.anox.py.rf %>%
  filter(is.na(min.age) & is.na(max.age))

Mo.anox.py.rf.missing.ages <- age.unc(data = Mo.anox.py.rf.missing.ages)

Are all Phanerozoic samples that had missing max and min ages in the middle of their assigned age uncertainty?

Generate summary of median of age uncertainty minus interpreted age for newly assigned Phanerozoic samples.

summary(
  rowMeans(
    cbind(Mo.anox.py.rf.missing.ages$max.age[Mo.anox.py.rf.missing.ages$interpreted.age <= 541], Mo.anox.py.rf.missing.ages$min.age[Mo.anox.py.rf.missing.ages$interpreted.age <= 541])) -
    Mo.anox.py.rf.missing.ages$interpreted.age[Mo.anox.py.rf.missing.ages$interpreted.age <= 541]
)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -5.684e-14  0.000e+00  0.000e+00  5.812e-17  0.000e+00  5.684e-14

All values are zero within the precision of R (tiny deviations are floating point errors).

Do all Phanerozoic samples that had missing max and min ages have 10 Myr age uncertainty? Generate summary of age uncertainty for newly assigned Phanerozoic samples.

summary(Mo.anox.py.rf.missing.ages$max.age[Mo.anox.py.rf.missing.ages$interpreted.age <= 541] - Mo.anox.py.rf.missing.ages$min.age[Mo.anox.py.rf.missing.ages$interpreted.age <= 541])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      10      10      10      10      10      10

All values are 10Myrs.

Are all Neoproterozoic samples that had missing max and min ages in the middle of their assigned age uncertainty?

Generate summary of median of age uncertainty minus interpreted age for newly assigned Neoproterozoic samples.

summary(
  rowMeans(
    cbind(Mo.anox.py.rf.missing.ages$max.age[Mo.anox.py.rf.missing.ages$interpreted.age > 541], Mo.anox.py.rf.missing.ages$min.age[Mo.anox.py.rf.missing.ages$interpreted.age > 541])) -
    Mo.anox.py.rf.missing.ages$interpreted.age[Mo.anox.py.rf.missing.ages$interpreted.age > 541]
)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0       0       0       0

All values are zero.

Do all Neoproterozoic samples that had missing max and min ages have 25 Myr age uncertainty? Generate summary of age uncertainty for newly assigned Neoproterozoic samples.

summary(Mo.anox.py.rf.missing.ages$max.age[Mo.anox.py.rf.missing.ages$interpreted.age > 541] - Mo.anox.py.rf.missing.ages$min.age[Mo.anox.py.rf.missing.ages$interpreted.age > 541])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      25      25      25      25      25      25

All values are 25Myrs.

Context data

We define a function to randomly assign values for geologic context variables used in random forest models in cases where a sample is missing data for that variable. This function is then used in the Monte Carlo approach detailed below to maximize the number of samples used and evaluate uncertainty associated with missing data by repeating the random assignments and statistical learning models n times per analysis (n = 100 times in this study).

partial.context <- function(data,
                            site.type,
                            metamorphic.bin,
                            basin.type,
                            site.latitude,
                            site.longitude,
                            environmental.bin,
                            lithology.name){

  # Note that this function currently only randomly generates data from factor levels within the dataset "data" when generating data for categorical variables. Could hard code although all levels seem to be present in most if not all datasets.
    
  # Each context variable must be set to "TRUE" (e.g. site.type = TRUE) inside function in order to be randomly assigned in the function partial.context().

  # levels check - make sure that numerical identification of empty factor name (i.e. missing data) is correct (just in case of changes in SGP data download structure)
  levels.check.total <- 0 # total number of context variables levels are checked for
  levels.check.true <- 0 # number of levels that missing name is correctly identified

  if(site.type == TRUE){
    # levels check (see above)
    levels.check.total <- levels.check.total +1
    if(levels(data$site.type)[1] == ""){
      levels.check.true <- levels.check.true + 1
    }
    # random assignment
    data$site.type[data$site.type == "cuttings"] <- "core" ## Combine core and cuttings
    data$site.type <- factor(data$site.type) ## omit unused factor levels
    data$site.type[data$site.type == ""] <- sample(x = levels(data$site.type)[2:nlevels(data$site.type)], size = nrow(filter(data, site.type == "")), replace=TRUE) ## assign all empty cells one of "core" or "outcrop". Starting at level 2 omits cells with no data from vector to sample.
    data$site.type <- factor(data$site.type) ## omit unused factor levels
  }

  if(metamorphic.bin == TRUE){
    # levels check (see above)
    levels.check.total <- levels.check.total +1
    if(levels(data$metamorphic.bin)[1] == ""){
      levels.check.true <- levels.check.true + 1
    }
    # random assignment
    data$metamorphic.bin[data$metamorphic.bin == ""] <- sample(x = levels(data$metamorphic.bin)[2:nlevels(data$metamorphic.bin)], size = nrow(filter(data, metamorphic.bin == "")), replace=TRUE) ## assign all empty cells one of "Anchizone", "Diagenetic zone" or "Epizone". Starting at level 2 omits cells with no data from vector to sample.
    data$metamorphic.bin <- factor(data$metamorphic.bin) ## omit unused factor levels
  }

   if(basin.type == TRUE){
    # levels check (see above)
    levels.check.total <- levels.check.total +1
    if(levels(data$basin.type)[1] == ""){
      levels.check.true <- levels.check.true + 1
    }
    # random assignment
    data$basin.type[data$basin.type == ""] <- sample(x = levels(data$basin.type)[2:nlevels(data$basin.type)], size = nrow(filter(data, basin.type == "")), replace=TRUE) ## assign all empty cells one of the other basin types in the dataset. Starting at level 2 omits cells with no data from vector to sample.
    data$basin.type <- factor(data$basin.type) ## omit unused factor levels
  }

   if(site.latitude == TRUE){
    data$site.latitude[is.na(data$site.latitude)] <- runif(nrow(filter(data, is.na(site.latitude))), -90, 90) ## assign all empty cells a random latitude from a uniform distribution between -90 and 90 degrees.
   }

   if(site.longitude == TRUE){
    data$site.longitude[is.na(data$site.longitude)] <- runif(nrow(filter(data, is.na(site.longitude))), -180, 180) ## assign all empty cells a random longitude from a uniform distribution between -180 and 180 degrees.
   }

   if(environmental.bin == TRUE){
    # levels check (see above)
    levels.check.total <- levels.check.total +1
    if(levels(data$environmental.bin)[1] == ""){
      levels.check.true <- levels.check.true + 1
    }
    # random assignment
    data$environmental.bin[data$environmental.bin == ""] <- sample(x = levels(data$environmental.bin)[2:nlevels(data$environmental.bin)], size = nrow(filter(data, environmental.bin == "")), replace=TRUE) ## assign all empty cells one of the other environmental bins in the dataset. Starting at level 2 omits cells with no data from vector to sample.
    data$environmental.bin <- factor(data$environmental.bin) ## omit unused factor levels
   }

     if(lithology.name == TRUE){
    # levels check (see above)
    levels.check.total <- levels.check.total +1
    if(levels(data$lithology.name)[1] == ""){
      levels.check.true <- levels.check.true + 1
    }
    # random assignment
    data$lithology.name[data$lithology.name == ""] <- sample(x = levels(data$lithology.name)[2:nlevels(data$lithology.name)], size = nrow(filter(data, lithology.name == "")), replace=TRUE) ## assign all empty cells one of the other lithologies in the dataset. Starting at level 2 omits cells with no data from vector to sample.
    data$lithology.name <- factor(data$lithology.name) ## omit unused factor levels
     }
  # Check that the level with no factor name has been correctly identified (this should always be true but is worth checking, especially if applied to different data). Round is used to accommodate numbers very close to 1 due to possible floating point errors (rounded to 3 decimal places). 
  if(round((levels.check.true/levels.check.total), digits = 3) == 1){
    print("Partial context randomly assigned correctly - missing data identified correctly")
  }else{
    print("ERROR - missing data NOT identified correctly")
  }
  return(data)
}

Test function on subdataset for primary Mo analyses.

Summary before partial.context function.

summary(Mo.anox.py.rf)
##  sample.identifier sample.original.name    site.type    site.latitude   
##  Min.   :    9     Length:2363                  :   0   Min.   :-33.00  
##  1st Qu.: 2850     Class :character     core    : 725   1st Qu.: 28.49  
##  Median : 6790     Mode  :character     cuttings:  48   Median : 54.07  
##  Mean   :18977                          outcrop :1590   Mean   : 46.43  
##  3rd Qu.:13386                                          3rd Qu.: 65.87  
##  Max.   :95921                                          Max.   : 79.97  
##                                                         NA's   :33      
##  site.longitude                  basin.type          metamorphic.bin
##  Min.   :-140.94   rift               :1318                  : 226  
##  1st Qu.:-135.62   passive margin     : 345   Anchizone      :1561  
##  Median : -74.56                      : 193   Diagenetic zone: 518  
##  Mean   : -25.73   intracratonic sag  : 167   Epizone        :  58  
##  3rd Qu.: 103.06   retro-arc foreland : 151                         
##  Max.   : 139.00   peripheral foreland: 114                         
##  NA's   :33        (Other)            :  75                         
##  interpreted.age    max.age         min.age      long.stratigraphy.name
##  Min.   :301.0   Min.   :329.8   Min.   :329.0   Length:2363           
##  1st Qu.:425.9   1st Qu.:410.8   1st Qu.:404.0   Class :character      
##  Median :503.0   Median :433.4   Median :430.5   Mode  :character      
##  Mean   :502.7   Mean   :441.3   Mean   :435.4                         
##  3rd Qu.:551.0   3rd Qu.:481.7   3rd Qu.:480.0                         
##  Max.   :830.0   Max.   :850.0   Max.   :811.5                         
##                  NA's   :1613    NA's   :1616                          
##  stratigraphy.name    environmental.bin       lithology.name    Al..wt..     
##  Length:2363                   :  20    shale        :1743   Min.   : 0.090  
##  Class :character   basinal    :1738    mudstone     : 337   1st Qu.: 3.700  
##  Mode  :character   inner shelf: 179    siltstone    : 139   Median : 5.500  
##                     outer shelf: 426                 : 102   Mean   : 5.741  
##                                         lime mudstone:  36   3rd Qu.: 7.685  
##                                         phosphorite  :   6   Max.   :22.790  
##                                         (Other)      :   0   NA's   :260     
##     Fe..wt..        Mo..ppm.         U..ppm.          FeHR.FeT     
##  Min.   : 0.05   Min.   :  0.00   Min.   :  0.02   Min.   :0.3800  
##  1st Qu.: 1.44   1st Qu.:  2.00   1st Qu.:  3.00   1st Qu.:0.5100  
##  Median : 2.42   Median :  6.46   Median :  5.64   Median :0.6800  
##  Mean   : 2.75   Mean   : 28.31   Mean   : 11.50   Mean   :0.6922  
##  3rd Qu.: 3.76   3rd Qu.: 29.41   3rd Qu.: 13.00   3rd Qu.:0.8300  
##  Max.   :38.50   Max.   :499.00   Max.   :295.29   Max.   :6.5000  
##  NA's   :37                       NA's   :433                      
##    Fe.py.FeHR         FeT.Al         TOC..wt..     
##  Min.   :0.0000   Min.   : 0.040   Min.   : 0.000  
##  1st Qu.:0.2000   1st Qu.: 0.340   1st Qu.: 0.800  
##  Median :0.5400   Median : 0.470   Median : 2.010  
##  Mean   :0.4867   Mean   : 0.573   Mean   : 2.969  
##  3rd Qu.:0.7500   3rd Qu.: 0.630   3rd Qu.: 3.845  
##  Max.   :1.0000   Max.   :34.070   Max.   :31.280  
##                   NA's   :254

Run function, then check summary after partial.context function.

Mo.anox.py.rf.partial.test <- partial.context(data = Mo.anox.py.rf,
                            site.type = TRUE,
                            metamorphic.bin = TRUE,
                            basin.type = TRUE,
                            site.latitude = TRUE,
                            site.longitude = TRUE,
                            environmental.bin = TRUE,
                            lithology.name = TRUE)
## [1] "Partial context randomly assigned correctly - missing data identified correctly"
summary(Mo.anox.py.rf.partial.test)
##  sample.identifier sample.original.name   site.type    site.latitude   
##  Min.   :    9     Length:2363          core   : 773   Min.   :-86.61  
##  1st Qu.: 2850     Class :character     outcrop:1590   1st Qu.: 28.29  
##  Median : 6790     Mode  :character                    Median : 53.97  
##  Mean   :18977                                         Mean   : 45.52  
##  3rd Qu.:13386                                         3rd Qu.: 65.73  
##  Max.   :95921                                         Max.   : 79.97  
##                                                                        
##  site.longitude                  basin.type          metamorphic.bin
##  Min.   :-161.46   rift               :1342   Anchizone      :1629  
##  1st Qu.:-135.62   passive margin     : 381   Diagenetic zone: 595  
##  Median : -73.76   intracratonic sag  : 188   Epizone        : 139  
##  Mean   : -25.35   retro-arc foreland : 168                         
##  3rd Qu.: 103.06   peripheral foreland: 139                         
##  Max.   : 170.18   back-arc           :  93                         
##                    (Other)            :  52                         
##  interpreted.age    max.age         min.age      long.stratigraphy.name
##  Min.   :301.0   Min.   :329.8   Min.   :329.0   Length:2363           
##  1st Qu.:425.9   1st Qu.:410.8   1st Qu.:404.0   Class :character      
##  Median :503.0   Median :433.4   Median :430.5   Mode  :character      
##  Mean   :502.7   Mean   :441.3   Mean   :435.4                         
##  3rd Qu.:551.0   3rd Qu.:481.7   3rd Qu.:480.0                         
##  Max.   :830.0   Max.   :850.0   Max.   :811.5                         
##                  NA's   :1613    NA's   :1616                          
##  stratigraphy.name    environmental.bin       lithology.name    Al..wt..     
##  Length:2363        basinal    :1746    shale        :1758   Min.   : 0.090  
##  Class :character   inner shelf: 184    mudstone     : 344   1st Qu.: 3.700  
##  Mode  :character   outer shelf: 433    siltstone    : 146   Median : 5.500  
##                                         lime mudstone:  41   Mean   : 5.741  
##                                         slate        :  14   3rd Qu.: 7.685  
##                                         oil shale    :  12   Max.   :22.790  
##                                         (Other)      :  48   NA's   :260     
##     Fe..wt..        Mo..ppm.         U..ppm.          FeHR.FeT     
##  Min.   : 0.05   Min.   :  0.00   Min.   :  0.02   Min.   :0.3800  
##  1st Qu.: 1.44   1st Qu.:  2.00   1st Qu.:  3.00   1st Qu.:0.5100  
##  Median : 2.42   Median :  6.46   Median :  5.64   Median :0.6800  
##  Mean   : 2.75   Mean   : 28.31   Mean   : 11.50   Mean   :0.6922  
##  3rd Qu.: 3.76   3rd Qu.: 29.41   3rd Qu.: 13.00   3rd Qu.:0.8300  
##  Max.   :38.50   Max.   :499.00   Max.   :295.29   Max.   :6.5000  
##  NA's   :37                       NA's   :433                      
##    Fe.py.FeHR         FeT.Al         TOC..wt..     
##  Min.   :0.0000   Min.   : 0.040   Min.   : 0.000  
##  1st Qu.:0.2000   1st Qu.: 0.340   1st Qu.: 0.800  
##  Median :0.5400   Median : 0.470   Median : 2.010  
##  Mean   :0.4867   Mean   : 0.573   Mean   : 2.969  
##  3rd Qu.:0.7500   3rd Qu.: 0.630   3rd Qu.: 3.845  
##  Max.   :1.0000   Max.   :34.070   Max.   :31.280  
##                   NA's   :254

Check that different random datasets are being generated each time.

Generate another test dataset and check they are different.

Mo.anox.py.rf.partial.test.2 <- partial.context(data = Mo.anox.py.rf,
                            site.type = TRUE,
                            metamorphic.bin = TRUE,
                            basin.type = TRUE,
                            site.latitude = TRUE,
                            site.longitude = TRUE,
                            environmental.bin = TRUE,
                            lithology.name = TRUE)
## [1] "Partial context randomly assigned correctly - missing data identified correctly"

For site type

summary(Mo.anox.py.rf.partial.test$site.type)
##    core outcrop 
##     773    1590
summary(Mo.anox.py.rf.partial.test.2$site.type)
##    core outcrop 
##     773    1590

For metamorphic bin:

summary(Mo.anox.py.rf.partial.test$metamorphic.bin)
##       Anchizone Diagenetic zone         Epizone 
##            1629             595             139
summary(Mo.anox.py.rf.partial.test.2$metamorphic.bin)
##       Anchizone Diagenetic zone         Epizone 
##            1634             594             135

For basin type:

summary(Mo.anox.py.rf.partial.test$basin.type)
##            back-arc            fore-arc   intracratonic sag      passive margin 
##                  93                  30                 188                 381 
## peripheral foreland  retro-arc foreland                rift              wrench 
##                 139                 168                1342                  22
summary(Mo.anox.py.rf.partial.test.2$basin.type)
##            back-arc            fore-arc   intracratonic sag      passive margin 
##                  98                  34                 180                 368 
## peripheral foreland  retro-arc foreland                rift              wrench 
##                 141                 176                1342                  24

For latitude:

summary(Mo.anox.py.rf.partial.test$site.latitude)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -86.61   28.29   53.97   45.52   65.73   79.97
summary(Mo.anox.py.rf.partial.test.2$site.latitude)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -83.15   28.38   53.97   45.88   65.87   80.35

For longitude:

summary(Mo.anox.py.rf.partial.test$site.longitude)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -161.46 -135.62  -73.76  -25.35  103.06  170.18
summary(Mo.anox.py.rf.partial.test.2$site.longitude)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -179.53 -135.62  -73.77  -25.40  103.06  178.61

For environmental bin:

summary(Mo.anox.py.rf.partial.test$environmental.bin)
##     basinal inner shelf outer shelf 
##        1746         184         433
summary(Mo.anox.py.rf.partial.test.2$environmental.bin)
##     basinal inner shelf outer shelf 
##        1745         187         431

For lithology:

summary(Mo.anox.py.rf.partial.test$lithology.name)
##     argillite     claystone  dolomudstone lime mudstone metasiltstone 
##             5             5             6            41             6 
##      mudstone     oil shale        pelite   phosphorite         shale 
##           344            12            10             9          1758 
##          silt     siltstone         slate 
##             7           146            14
summary(Mo.anox.py.rf.partial.test.2$lithology.name)
##     argillite     claystone  dolomudstone lime mudstone metasiltstone 
##            11             7             7            45             6 
##      mudstone     oil shale        pelite   phosphorite         shale 
##           341             9             7            16          1749 
##          silt     siltstone         slate 
##             8           150             7

Al imputation

We define a function to estimate the lithology of samples with no Al concentration based upon lithology.

First, we make and save a lookup table including all samples in the main dataset, summarizing Al concentrations by lithology by calculating the median and the 25th and 75th percentiles (the bounds of the interquartile range). We make extra single-row table for all samples with identified lithologies.

Al.lookup <- trace.toc.full %>%
    filter(!(lithology.name == "") & !(is.na(Al..wt..))) %>%
    group_by(lithology.name) %>%
    summarize(median = median(Al..wt.., na.rm = TRUE), perc.25 = quantile(Al..wt.., prob = 0.25, na.rm = T), perc.75 = quantile(Al..wt.., prob = 0.75, na.rm = T))

# make extra single-row table for all samples with identified lithologies
Al.lookup.all <- trace.toc.full %>%
    filter(!(lithology.name == "") & !(is.na(Al..wt..))) %>%
    summarize(lithology.name = "all identified samples", median = median(Al..wt.., na.rm = TRUE), perc.25 = quantile(Al..wt.., prob = 0.25, na.rm = T), perc.75 = quantile(Al..wt.., prob = 0.75, na.rm = T))

save(Al.lookup, Al.lookup.all, file ="Al.lookup_trace.toc.full.RData")

We then define the imputation function. For each sample with identified lithology but missing [Al] data we randomly assign that sample an [Al] value from a uniform distribution with the 25th and 75th percentile [Al] concentrations for that lithology as the minimum and maximum bounds. For samples with no identified lithology that are missing [Al] data, we randomly assign that sample an [Al] value from a uniform distribution with the 25th and 75th percentile [Al] concentrations for all samples as the minimum and maximum bounds. Note that in this dataset there is no Al data for “silt” samples - so we use data for “siltstone” for “silt” (note this only impacts one sample, from the Ediacaran Doushantuo Formation, that is only used in TOC analyses).

Al.impute <- function(data){
  load("Al.lookup_trace.toc.full.RData")

  for(row in 1:nrow(data)){
    if(is.na(data$Al..wt..[row]) == TRUE){
    # If lithology is unidentified, sample from full distribution of identified samples
    if(data$lithology.name[row] == ""){
      data$Al..wt..[row] <- runif(1, Al.lookup.all$perc.25[Al.lookup.all$lithology.name == "all identified samples"], Al.lookup.all$perc.75[Al.lookup.all$lithology.name == "all identified samples"])
    }else if(data$lithology.name[row] == "silt"){
     # There is no Al data for "silt" - use data for "siltstone" for "silt" (note this only impacts one sample, from the Ediacaran Doushantuo Formation, that is only used in TOC analyses)
      data$Al..wt..[row] <- runif(1, Al.lookup$perc.25[Al.lookup$lithology.name == "siltstone"], Al.lookup$perc.75[Al.lookup$lithology.name == "siltstone"])
      }else {
     # Otherwise (except in the case of the two exceptions above), sample from the lithology-specific Al distribution.
      data$Al..wt..[row] <- runif(1, Al.lookup$perc.25[Al.lookup$lithology.name == data$lithology.name[row]], Al.lookup$perc.75[Al.lookup$lithology.name == data$lithology.name[row]])
      }
    }
  }
  return(data)
}

Test the Al imputation method. Are all of the imputed Al values within the interquartile ranges established for their given lithology? Use primary Mo dataset as an example.

# tag samples missing Al with identified lithologies
Mo.anox.py.rf.no.Al.test <- Mo.anox.py.rf

Mo.anox.py.rf.no.Al.test$missing.Al[is.na(Mo.anox.py.rf.no.Al.test$Al..wt..) == TRUE] <- TRUE
Mo.anox.py.rf.no.Al.test$missing.Al[is.na(Mo.anox.py.rf.no.Al.test$Al..wt..) == FALSE] <- FALSE

Mo.anox.py.rf.no.Al.test$missing.lith[Mo.anox.py.rf.no.Al.test$lithology.name == ""] <- TRUE
Mo.anox.py.rf.no.Al.test$missing.lith[Mo.anox.py.rf.no.Al.test$lithology.name != ""] <- FALSE
Mo.anox.py.rf.imputed.Al <- Al.impute(data = Mo.anox.py.rf.no.Al.test)


Mo.anox.py.rf.imputed.Al.no.Al.before <- filter(Mo.anox.py.rf.imputed.Al, (missing.Al == TRUE & missing.lith == FALSE))

Mo.anox.py.rf.imputed.Al.no.Al.before <- merge(Mo.anox.py.rf.imputed.Al.no.Al.before, Al.lookup, id = c(lithology.name), all.x = TRUE)

for(row in 1:nrow(Mo.anox.py.rf.imputed.Al.no.Al.before)){
Mo.anox.py.rf.imputed.Al.no.Al.before$test[row] <- between(Mo.anox.py.rf.imputed.Al.no.Al.before$Al..wt..[row], Mo.anox.py.rf.imputed.Al.no.Al.before$perc.25[row], Mo.anox.py.rf.imputed.Al.no.Al.before$perc.75[row])
}
summary(Mo.anox.py.rf.imputed.Al.no.Al.before$test)
##    Mode    TRUE 
## logical     257

It is true for all samples in this test dataset that all of the imputed [Al] values within the interquartile ranges established for their given lithology.

3. Age model

We define a function to randomly assign an age to each sample from within its prescribed age uncertainty. The age is assigned from a uniform distribution bounded by the minimum and maximum ages. This function is then used in the Monte Carlo approach detailed below to evaluate uncertainty associated with geologic ages by repeating the random assignments and statistical learning models n times per analysis (n = 100 in the analyses of this study). In a few cases (82 samples in trace.toc.full), the sample minimum and maximum assigned ages were flipped by SGP contributors during data entry. We correct for samples like this in the function. In even fewer cases (6 in trace.toc.full, from the same two Cambrian formations as the flipped samples above) there are samples where contributors have given samples the same max, min and interpreted age. Those samples are assigned their interpreted age.

age.model <- function(data){

    # Default method (vast majority of samples)
    data$age.model[data$max.age > data$min.age] <- runif(n = nrow(filter(data, max.age > min.age)), min = data$min.age[data$max.age > data$min.age], max = data$max.age[data$max.age > data$min.age])

    # Samples with flipped max and min (note that max and min are flipped in runif())
    data$age.model[data$min.age > data$max.age] <- runif(n = nrow(filter(data, min.age > max.age)), min = data$max.age[data$min.age > data$max.age], max = data$min.age[data$min.age > data$max.age])
    
    # Samples with completed age uncertainty but max, min and interpreted age the same
    data$age.model[data$min.age == data$max.age] <- data$interpreted.age[data$min.age == data$max.age]
  return(data)
}

Test age model function by confirming that all ages in age model are within the age uncertainty for each sample, using the primary Mo subdataset as an example.

Mo.anox.py.rf.age.model <- Mo.anox.py.rf

Mo.anox.py.rf.age.model <- age.unc(data = Mo.anox.py.rf.age.model)

Mo.anox.py.rf.age.model <- age.model(data = Mo.anox.py.rf.age.model)

within.age.bounds <- as.character()
for(row in 1: nrow(Mo.anox.py.rf.age.model)){
within.age.bounds[row] <- between(Mo.anox.py.rf.age.model$age.model[row], Mo.anox.py.rf.age.model$min.age[row], Mo.anox.py.rf.age.model$max.age[row])
}

print("For how many samples is it true that the age model is within the age uncertainty bounds?")
## [1] "For how many samples is it true that the age model is within the age uncertainty bounds?"
summary(as.factor(within.age.bounds))
## TRUE 
## 2363

4. Monte Carlo random forest function

We define a function to conduct n random forest model analyses (100 in main text of this study), with the aim of 1) generating partial dependence plots for the variable of interest with respect to geologic time when all other specified variables are held constant, 2) generating variable importance analyses of all other specified variables, 3) evaluating model uncertainties associated with partial context data and geologic age uncertainties.

This function calls the functions age.unc(), Al.impute() and partial.context() above if the options to run them are selected in the function itself.

Monte.Carlo.rF <- function(data,
                       resp.var,
                       vars,
                       n,
                       run.age.unc,
                       run.partial.context,
                       run.Al.impute){

  # initiate data frames
  partial.plot.data <-  data.frame(Age=double(), Var=double(), Iteration=double())
  importance.data <-  data.frame(Variable=double(), IncMSE=double(), IncNodePurity=double(), Iteration=double())

  # initiate loop
  for (iteration in 1:n){
  
  # make new data frame for iteration.
  data_it <- data
    
  if(run.age.unc == TRUE){
    data_it <- age.unc(data_it)
  }

  if(run.Al.impute == TRUE){
    data_it <- Al.impute(data_it)
  }

  # Note that Al.impute should be run before partial.context so that samples without assigned lithology have Al imputed based on all identified samples (not their random assignment in this iteration).
  if(run.partial.context == TRUE){
  data_it <- partial.context(data_it,
                  site.type = TRUE,
                  metamorphic.bin = TRUE,
                  basin.type = TRUE,
                  site.latitude = TRUE,
                  site.longitude = TRUE,
                  environmental.bin = TRUE,
                  lithology.name = TRUE)
  }


  data_it <- age.model(data_it)

  # select all of the variables named in the "vars" vector to be columns in the dataframe data.for.rF
  data.for.rF <- select(data_it, all_of(vars))

  # remove any rows that still contain NAs
  data.for.rF <- na.omit(data.for.rF)

  # define a formula that identifies the response variable defined in function and uses all other vars as model variables. 
  formula <- as.formula(paste(resp.var, "~ ."))

  # Random forest function parameter notes
  # ntree - Number of trees to grow. Default is 500 (used here)
  # mtry=3 - Number of variables randomly sampled as candidates at each split. The default values are different for classification (sqrt(p) where p is number of variables in x) and regression (p/3). p/3 is therefore what is used here. 
  # importance=TRUE - used for generating variable importance plots.
  # proximity=TRUE - used for generating proximity plots (we never do this in this study). 
  # na.action=na.omit - A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named.)
  # Note that replace = TRUE by default.

  # suggested new equation model code:
  it.rF <- randomForest(formula, data.for.rF, importance=TRUE, na.action=na.omit)

  it.partial.plot <- as.data.frame(partialPlot(it.rF, data.for.rF, age.model))

  it.import <- as.data.frame(importance(it.rF))
  it.import <- tibble::rownames_to_column(it.import, "Variable")

  # NOTE - we do not use rbind() to combine random forest iterations in order to make this function more efficient. This slightly reduces readability but increases efficiency dramatically for high n values. 
  partial.i <- nrow(partial.plot.data)+1
  partial.j <- nrow(partial.plot.data)+nrow(it.partial.plot)
  partial.plot.data[partial.i:partial.j, 1:2] <- it.partial.plot
  partial.plot.data[partial.i:partial.j, 3] <- rep(iteration, nrow(it.partial.plot))

  import.i <- nrow(importance.data)+1
  import.j <- nrow(importance.data)+nrow(it.import)
  importance.data[import.i:import.j, 1:3] <- it.import
  importance.data[import.i:import.j, 4] <- rep(iteration, nrow(it.import))

  }

  # Name variables in variable importance data nicely for plotting etc.
  importance.data$Variable[importance.data$Variable == "age.model"] <- "Age model"
  importance.data$Variable[importance.data$Variable == "site.type"] <- "Site type"
  importance.data$Variable[importance.data$Variable == "metamorphic.bin"] <- "Metamorphic bin"
  importance.data$Variable[importance.data$Variable == "basin.type"] <- "Basin type"
  importance.data$Variable[importance.data$Variable == "site.latitude"] <- "Latitude"
  importance.data$Variable[importance.data$Variable == "site.longitude"] <- "Longitude"
  importance.data$Variable[importance.data$Variable == "lithology.name"] <- "Lithology"
  importance.data$Variable[importance.data$Variable == "environmental.bin"] <- "Environmental bin"
  if(("TOC..wt.." %in% vars == TRUE) & (resp.var != "TOC..wt..")){
  importance.data$Variable[importance.data$Variable == "TOC..wt.."] <- "TOC"
  }
  if(("Fe.py.FeHR" %in% vars == TRUE) & (resp.var != "Fe.py.FeHR")){
  importance.data$Variable[importance.data$Variable == "Fe.py.FeHR"] <- "Fepy/FeHR"
  }
  if(("Al..wt.." %in% vars == TRUE) & (resp.var != "Al..wt..")){
  importance.data$Variable[importance.data$Variable == "Al..wt.."] <- "Al"
  }

  rf.sum <- list(partial.plot.data, importance.data)
  names(rf.sum) <- c("partial.plot.data", "importance.data")
  return(rf.sum)
}

Test Monte Carlo random forest function using primary Mo analyses as an example, with 3 iterations.

test <- Monte.Carlo.rF(data = Mo.anox.py.rf,
                       resp.var = "Mo..ppm.",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "Mo..ppm.",
                                "TOC..wt..",
                                "Fe.py.FeHR",
                                "Al..wt.."),
                       n = 3,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = TRUE)
## [1] "Partial context randomly assigned correctly - missing data identified correctly"
## [1] "Partial context randomly assigned correctly - missing data identified correctly"
## [1] "Partial context randomly assigned correctly - missing data identified correctly"
summary(test)
##                   Length Class      Mode
## partial.plot.data 3      data.frame list
## importance.data   4      data.frame list
summary(test$partial.plot.data)
##       Age             Var          Iteration
##  Min.   :296.2   Min.   :23.30   Min.   :1  
##  1st Qu.:429.2   1st Qu.:24.03   1st Qu.:1  
##  Median :572.5   Median :25.64   Median :2  
##  Mean   :572.3   Mean   :28.41   Mean   :2  
##  3rd Qu.:714.6   3rd Qu.:29.36   3rd Qu.:3  
##  Max.   :849.7   Max.   :42.95   Max.   :3
summary(test$importance.data)
##    Variable             IncMSE       IncNodePurity       Iteration
##  Length:33          Min.   : 6.995   Min.   :  75024   Min.   :1  
##  Class :character   1st Qu.:11.459   1st Qu.: 115169   1st Qu.:1  
##  Mode  :character   Median :18.465   Median : 450122   Median :2  
##                     Mean   :19.839   Mean   : 581722   Mean   :2  
##                     3rd Qu.:21.462   3rd Qu.: 590741   3rd Qu.:3  
##                     Max.   :56.024   Max.   :2825497   Max.   :3

5. Running primary analyses

Set number of iterations. Default for main text analyses is n = 100.

n <- 100

Molybdenum

For our primary Mo analyses we include all key geologic context variables (“site.type”, “metamorphic.bin”, “basin.type”, “site.latitude”, “site.longitude”, “lithology.name”, “environmental.bin”) as well as geologic age, TOC, Fepy/FeHR and [Al] in our random forest model. We incorporate TOC and Fepy/FeHR to control for the impacts of organic carbon and sulfide availability on Mo reduction rates (note that only anoxic samples are used in these analyses). [Al] is incorporated to control for detrital input.

Note that in all of these random forest function calls figures and results are hidden to prevent extended printouts in the R markdown files, but “fig.show=‘hide’” and “results=‘hide’” can be deleted to see full output.

Mo.anox.py.rf.results <- Monte.Carlo.rF(data = Mo.anox.py.rf,
                       resp.var = "Mo..ppm.",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "Mo..ppm.",
                                "TOC..wt..",
                                "Fe.py.FeHR",
                                "Al..wt.."),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = TRUE)
Mo.anox.py.rf.partial <- Mo.anox.py.rf.results$partial.plot.data
Mo.anox.py.rf.import <- Mo.anox.py.rf.results$importance.data

Uranium

For our primary U analyses we include all key geologic context variables (“site.type”, “metamorphic.bin”, “basin.type”, “site.latitude”, “site.longitude”, “lithology.name”, “environmental.bin”) as well as geologic age, TOC and [Al] in our random forest model. We incorporate TOC to control for the impacts of organic carbon availability on U reduction rates (note that only anoxic samples are used in these analyses). [Al] is incorporated to control for detrital input.

U.anox.rf.results <- Monte.Carlo.rF(data = U.anox.rf,
                       resp.var = "U..ppm.",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "U..ppm.",
                                "TOC..wt..",
                                "Al..wt.."),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = TRUE)
U.anox.rf.partial <- U.anox.rf.results$partial.plot.data
U.anox.rf.import <- U.anox.rf.results$importance.data

Proportion euxinic

For our primary proportion euxinic analyses we include all key geologic context variables (“site.type”, “metamorphic.bin”, “basin.type”, “site.latitude”, “site.longitude”, “lithology.name”, “environmental.bin”) as well as geologic age and [Al] in our random forest model. [Al] is incorporated to control for detrital input. Note that only anoxic samples are used in these analyses.

Note that unless warnings are hidden (as they are in this R Markdown file) a warning appears asking if we are sure we want to do a regression because of the binary response variable. In this case we do, following a similar approach in Sperling et al. 2015 (Nature). For completeness, this warning would read: “## Warning in randomForest.default(m, y, …): The response has five or fewer unique values. Are you sure you want to do regression?” if it was not hidden.

Fepy.anox.rf.results <- Monte.Carlo.rF(data = Fepy.anox.rf,
                       resp.var = "euxinic.Fe",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "euxinic.Fe",
                                "Al..wt.."),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = TRUE)
Fepy.anox.rf.partial <- Fepy.anox.rf.results$partial.plot.data
Fepy.anox.rf.import <- Fepy.anox.rf.results$importance.data

Total Organic Carbon

For our primary TOC analyses we include all key geologic context variables (“site.type”, “metamorphic.bin”, “basin.type”, “site.latitude”, “site.longitude”, “lithology.name”, “environmental.bin”) as well as geologic age and [Al] in our random forest model. [Al] is incorporated to control for detrital input.

TOC.all.rf.results <- Monte.Carlo.rF(data = TOC.all.rf,
                       resp.var = "TOC..wt..",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "TOC..wt..",
                                "Al..wt.."),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = TRUE)
TOC.all.rf.partial <- TOC.all.rf.results$partial.plot.data
TOC.all.rf.import <- TOC.all.rf.results$importance.data

6. Function to interpolate partial dependence plot lines

To plot envelopes that summarizing all n (100) random forests, we need to interpolate the lines drawn between points defined in partial dependence plots. We can then evaluate the distribution of partial dependence plot values at each timestep.

interp.pdp <- function(data, age.rounding.factor, n){
  # data - partial dependence plot dataframe
  # age.rounding.factor - number of digits (decimal places) for millions of years
  # n - number of iterations

  # initiate dataframe to store interpolated PDP values
  PDPs.interpolated <-  data.frame(Age=double(), Var=double(), Iteration=double())

  # Before starting, round age splits to nearest x Myrs (if age.rounding factor = 0, 1 Myrs; if age.rounding factor = 1, 0.1 Myrs)
  data$Age.rounded <- round(data$Age, digits=age.rounding.factor)

  # loop through n iterations of Monte Carlo random forest analysis
  for (It in 1:n){
  # create subset for specific iteration
  It.subset <- filter(data, Iteration == It)

  # interpolate values at each timestep
  Interpolated.Age.Var <- approx(x = It.subset$Age.rounded,
                                y = It.subset$Var,
                                xout = seq(min(It.subset$Age.rounded, na.rm = T),
                                           max(It.subset$Age.rounded, na.rm = T),
                                           10^-age.rounding.factor), 
                                method = "linear", 
                                ties = mean) %>%
    as.data.frame() %>%
    setNames(c("Age", "Var")) # only used for visual inspection of this dataframe

  # Add interpolated data to summary dataframe
  i <- nrow(PDPs.interpolated)+1
  j <- nrow(PDPs.interpolated)+nrow(Interpolated.Age.Var)
  PDPs.interpolated[i:j , 1:2] <- Interpolated.Age.Var
  PDPs.interpolated[i:j , 3] <- It
  }

  # Summarize interpolated PDPs
  # convert age to factor and then convert back otherwise end up with occasional weird duplicates using group_by() on a numerical variable. 
  Var.sum <- PDPs.interpolated %>%
  group_by(as.factor(Age)) %>%
  summarize(min = min(Var, na.rm=T),
            perc.05 = quantile(Var, probs=0.05, na.rm=T),
            perc.25 = quantile(Var, probs=0.25, na.rm=T),
            median = median(Var, na.rm=T),
            mean = mean(Var, na.rm=T),
            perc.75 = quantile(Var, probs=0.75, na.rm=T),
            perc.95 = quantile(Var, probs=0.95, na.rm=T),
            max = max(Var, na.rm=T)) %>%
  as.data.frame()
  
  # rename as.factor(Age) column "Age"
  names(Var.sum)[1] <- "Age"
  # make Age numeric again
  Var.sum$Age <- as.numeric(paste(Var.sum$Age))

  return(Var.sum)
}

Test function using primary Mo analyses. Do lines based on the (sparse) partial dependence plot points generated by the random forest function sit within the min-max envelope generated by our function? In this study we use an age rounding factor of 1 (to nearest 0.1Myrs) for the sake of computational efficiency but the patterns are observed at rounding factors of 2 (0.01Myrs) and 0 (1Myrs). Note that really coarse rounding factors (e.g. a factor of -1, or 10Myrs) will not accurately reproduce the partial dependence plots generated by the random forest function.

mo.pdp.test <- interp.pdp(data = Mo.anox.py.rf.partial, age.rounding.factor = 1, n = n)

Mo.test.plot <- ggplot()+
  geom_ribbon(data = mo.pdp.test, aes(x = Age, ymin = min, ymax = max), alpha=.4, fill="darkred", color="grey70", size=.3)+
  geom_path(data = Mo.anox.py.rf.partial, aes(x = Age, y = Var, group = Iteration), alpha = 0.4, size = 0.3, color = "grey10")+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(0,62),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Mo (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.3,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

Mo.test.plot

All of the individual lines lie within the envelope of minimum and maximum values defined by our interp.pdp() function.

7. Plotting primary analyses

In these plots we summarize the 100 partial dependence plots run for each analysis as envelopes. The lighter gray envelope describes the maximum and minimum values for each timestep and the darker gray envelope describes the 25th and 75th percentiles.

Molybdenum

mo.pdp <- interp.pdp(data = Mo.anox.py.rf.partial, age.rounding.factor = 1, n = n)

Mo.plot <- ggplot()+
  geom_ribbon(data = mo.pdp, aes(x = Age, ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(data = mo.pdp, aes(x = Age, ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(0,62),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Mo (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.3,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

Mo.plot

Uranium

u.pdp <- interp.pdp(data = U.anox.rf.partial, age.rounding.factor = 1, n = n)

U.plot <- ggplot()+
  geom_ribbon(data = u.pdp, aes(x = Age, ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(data = u.pdp, aes(x = Age, ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(0,62),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("U (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.3,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

U.plot

Proportion euxinic

Note that we hide warnings here because of random forest warning about using binary variable in a regression (here we do want to do a regression to replicate the broad approach of Sperling et al. 2015, Nature - see similar comments above).

prop_eux.pdp <- interp.pdp(data = Fepy.anox.rf.partial, age.rounding.factor = 1, n = n)

prop_eux.plot <- ggplot()+
  geom_ribbon(data = prop_eux.pdp, aes(x = Age, ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(data = prop_eux.pdp, aes(x = Age, ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-.01,1.03),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Proportion euxinic")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.3,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

prop_eux.plot

Total Organic Carbon

TOC.pdp <- interp.pdp(data = TOC.all.rf.partial, age.rounding.factor = 1, n = n)

TOC.plot <- ggplot()+
  geom_ribbon(data = TOC.pdp, aes(x = Age, ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(data = TOC.pdp, aes(x = Age, ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(0,8.3),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("TOC (wt %)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.3,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

TOC.plot

Combine to generate primary summary plot

In these plots we have delineated the three main phases of marine biogeochemistry between 1000 and 300Ma - with transitions around the base of of the Cambrian and Devonian periods.

Mo.plot.for.sum <- ggplot(mo.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_cartesian(xlim=rev(c(298.9,1000)), ylim=c(-.4,62),expand=FALSE)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Mo (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

U.plot.for.sum <- ggplot(u.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-.4,62),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("U (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         
        legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

prop_eux.plot.for.sum <- ggplot(prop_eux.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_cartesian(xlim=rev(c(298.9,1000)), ylim=c(-.01,1.03),expand=FALSE)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Proportion euxinic")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

TOC.plot.for.sum <- ggplot(TOC.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-0.04,8.3),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("TOC (wt%)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         
        legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

summary.plot <- ggarrange2(Mo.plot.for.sum, prop_eux.plot.for.sum, U.plot.for.sum, TOC.plot.for.sum, ncol=2, heights=c(1,1))

ggsave("Figure 2 Partial Dependence Plot with shading.pdf", summary.plot, height=14, width=26)

8. Expanded analyses varying redox-sensitive predictors

Anoxic Mo (no pyrite)

Here, we generate an anoxic Mo equivalent to our primary U dataset (not restricting to anoxic samples with iron speciation). Fepy/FeHR is not included as a predictor variable.

Mo.anox.rf <- trace.toc.full %>%
  filter(!is.na(Mo..ppm.)) %>%
  filter(FeHR.FeT >= 0.38 | FeT.Al >= 0.53)

nrow(Mo.anox.rf)

Mo.anox.rf.results <- Monte.Carlo.rF(data = Mo.anox.rf,
                       resp.var = "Mo..ppm.",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "Mo..ppm.",
                                "TOC..wt..",
                                "Al..wt.."),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = TRUE)
Mo.anox.rf.partial <- Mo.anox.rf.results$partial.plot.data
Mo.anox.rf.import <- Mo.anox.rf.results$importance.data

mo.no_py.pdp <- interp.pdp(data = Mo.anox.rf.partial, age.rounding.factor = 1, n = n)

We combine these analyses with our primary Mo analyses for plotting.

mo.pdp$treatment <- "Anoxic samples + TOC + Fepy/FeHR"
mo.no_py.pdp$treatment <- "Anoxic samples + TOC"

mo.pdp.sum <- rbind(mo.pdp,
                    mo.no_py.pdp)

Anoxic U with pyrite

Here, we generate an anoxic U equivalent to our primary Mo dataset (restricting to anoxic samples with iron speciation). Fepy/FeHR is included as a predictor variable.

U.anox.py.rf <- trace.toc.full %>%
  filter(!is.na(U..ppm.) & !is.na(Fe.py.FeHR)) %>%
  filter(FeHR.FeT >= 0.38)

nrow(U.anox.py.rf)

U.anox.py.rf.results <- Monte.Carlo.rF(data = U.anox.py.rf,
                       resp.var = "U..ppm.",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "U..ppm.",
                                "TOC..wt..",
                                "Fe.py.FeHR",
                                "Al..wt.."),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = TRUE)
U.anox.py.rf.partial <- U.anox.py.rf.results$partial.plot.data
U.anox.py.rf.import <- U.anox.py.rf.results$importance.data

u.w_py.pdp <- interp.pdp(data = U.anox.py.rf.partial, age.rounding.factor = 1, n = n)

We combine these analyses with our primary U analyses for plotting.

u.pdp$treatment <- "Anoxic samples + TOC"
u.w_py.pdp$treatment <- "Anoxic samples + TOC + Fepy/FeHR"

u.pdp.sum <- rbind(u.pdp,
                    u.w_py.pdp)

Proportion euxinic with TOC

Note that we hide warnings here because of random forest warning about using binary variable in a regression (here we do want to do a regression to replicate the broad approach of Sperling et al. 2015, Nature).

Fepy.anox.w_TOC.rf <- trace.toc.full %>%
  filter(!is.na(Fe.py.FeHR) & !is.na(TOC..wt..)) %>%
  filter(FeHR.FeT >= 0.38)

nrow(Fepy.anox.w_TOC.rf)

# For the analysis of the proportion of euxinic samples, we also need to code samples based # upon whether they are euxinic (based on iron speciation) in a binary fashion.

Fepy.anox.w_TOC.rf$euxinic.Fe[Fepy.anox.w_TOC.rf$Fe.py.FeHR >= 0.7] <- 1
Fepy.anox.w_TOC.rf$euxinic.Fe[Fepy.anox.w_TOC.rf$Fe.py.FeHR < 0.7] <- 0

Fepy.anox.w_TOC.rf.results <- Monte.Carlo.rF(data = Fepy.anox.w_TOC.rf,
                       resp.var = "euxinic.Fe",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "euxinic.Fe",
                                "Al..wt..",
                                "TOC..wt.."),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = TRUE)
Fepy.anox.w_TOC.rf.partial <- Fepy.anox.w_TOC.rf.results$partial.plot.data
Fepy.anox.w_TOC.rf.import <- Fepy.anox.w_TOC.rf.results$importance.data

prop_eux.w_TOC.pdp <- interp.pdp(data = Fepy.anox.w_TOC.rf.partial, age.rounding.factor = 1, n = n)

We combine these analyses with our primary proportion euxinic analyses for plotting.

prop_eux.pdp$treatment <- "Anoxic samples"
prop_eux.w_TOC.pdp$treatment <- "Anoxic samples + TOC"

prop_eux.pdp.sum <- rbind(prop_eux.pdp,
                    prop_eux.w_TOC.pdp)

Anoxic TOC (no pyrite)

TOC.anox.rf <- trace.toc.full %>%
  filter(!is.na(TOC..wt..))  %>%
  filter(FeHR.FeT >= 0.38 | FeT.Al >= 0.53)

nrow(TOC.anox.rf)

TOC.anox.rf.results <- Monte.Carlo.rF(data = TOC.anox.rf,
                       resp.var = "TOC..wt..",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "TOC..wt..",
                                "Al..wt.."),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = TRUE)
TOC.anox.rf.partial <- TOC.anox.rf.results$partial.plot.data
TOC.anox.rf.import <- TOC.anox.rf.results$importance.data

TOC.anox.pdp <- interp.pdp(data = TOC.anox.rf.partial, age.rounding.factor = 1, n = n)

Anoxic TOC with pyrite

TOC.anox.py.rf <- trace.toc.full %>%
  filter(!is.na(TOC..wt..) & !is.na(Fe.py.FeHR))  %>%
  filter(FeHR.FeT >= 0.38)

nrow(TOC.anox.rf)

TOC.anox.py.rf.results <- Monte.Carlo.rF(data = TOC.anox.py.rf,
                       resp.var = "TOC..wt..",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "TOC..wt..",
                                "Al..wt..",
                                "Fe.py.FeHR"),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = TRUE)
TOC.anox.py.rf.partial <- TOC.anox.py.rf.results$partial.plot.data
TOC.anox.py.rf.import <- TOC.anox.py.rf.results$importance.data

TOC.anox.py.pdp <- interp.pdp(data = TOC.anox.py.rf.partial, age.rounding.factor = 1, n = n)

We combine these analyses with our primary TOC analyses for plotting.

TOC.pdp$treatment <- "All samples"
TOC.anox.pdp$treatment <- "Anoxic samples"
TOC.anox.py.pdp$treatment <- "Anoxic samples + Fepy/FeHR"

TOC.pdp.sum <- rbind(TOC.pdp,
                     TOC.anox.pdp,
                     TOC.anox.py.pdp)

Summary plotting of analyses including varying redox treatments

In these summary plots we just plot the interquartile ranges for our analyses to ease the comparison of trends in central tendancy.

Mo.plot.for.redox.var.sum <- ggplot(mo.pdp.sum, aes(x=Age))+
  #annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  #annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75, fill=treatment, color=treatment), alpha=.7, size=.5)+
  scale_fill_manual(values = c(rgb(127, 160, 86, maxColorValue = 255),
                               rgb(179, 224, 149, maxColorValue = 255)),
                    name = "Treatment")+
  scale_color_manual(values = c(rgb(127, 160, 86, maxColorValue = 255),
                               rgb(179, 224, 149, maxColorValue = 255)),
                    name = "Treatment")+
  theme_bw()+
  coord_cartesian(xlim=rev(c(298.9,1000)), ylim=c(-.4,82),expand=FALSE)+
  scale_x_reverse(breaks=c(300,400,254,600,700,800,900,1000))+
  ylab("Mo (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

U.plot.for.redox.var.sum <- ggplot(u.pdp.sum, aes(x=Age))+
  #annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  #annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75, fill=treatment, color=treatment), alpha=.7, size=.5)+
  scale_fill_manual(values = c(rgb(103, 165, 153, maxColorValue = 255),
                               rgb(191, 208, 186, maxColorValue = 255)),
                    name = "Treatment")+
  scale_color_manual(values = c(rgb(103, 165, 153, maxColorValue = 255),
                               rgb(191, 208, 186, maxColorValue = 255)),
                    name = "Treatment")+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-.4,62),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("U (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         
        legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

prop_eux.plot.for.redox.var.sum <- ggplot(prop_eux.pdp.sum, aes(x=Age))+
  #annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  #annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75, fill=treatment, color=treatment), alpha=.7, size=.5)+
  scale_fill_manual(values = c(rgb(240, 64, 40, maxColorValue = 255),
                               rgb(251, 154, 133, maxColorValue = 255)),
                    name = "Treatment")+
  scale_color_manual(values = c(rgb(240, 64, 40, maxColorValue = 255),
                               rgb(251, 154, 133, maxColorValue = 255)),
                    name = "Treatment")+
  theme_bw()+
  coord_cartesian(xlim=rev(c(298.9,1000)), ylim=c(-.01,1.03),expand=FALSE)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Proportion euxinic")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

TOC.plot.for.redox.var.sum <- ggplot(TOC.pdp.sum, aes(x=Age))+
  #annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  #annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75, fill=treatment, color=treatment), alpha=.7, size=.5)+
  scale_fill_manual(values = c(rgb(203, 140, 55, maxColorValue = 255),
                               rgb(254, 179, 66, maxColorValue = 255),
                               rgb(254, 217, 106, maxColorValue = 255)),
                    name = "Treatment")+
  scale_color_manual(values = c(rgb(203, 140, 55, maxColorValue = 255),
                               rgb(254, 179, 66, maxColorValue = 255),
                               rgb(254, 217, 106, maxColorValue = 255)),
                    name = "Treatment")+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-0.04,8.3),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("TOC (wt%)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         
        legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

summary.redox.var.plot <- ggarrange2(Mo.plot.for.redox.var.sum, prop_eux.plot.for.redox.var.sum, U.plot.for.redox.var.sum, TOC.plot.for.redox.var.sum, ncol=2, heights=c(1,1))

ggsave("Figure Sx Partial Dependence Plot (varying redox treatments).pdf", summary.redox.var.plot, height=14, width=26)

9. Primary analyses without Al

We also re-run our analyses without incorporating [Al] as a broad proxy for detrital input.

Molybdenum

Mo.anox.py.no_Al.rf.results <- Monte.Carlo.rF(data = Mo.anox.py.rf,
                       resp.var = "Mo..ppm.",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "Mo..ppm.",
                                "TOC..wt..",
                                "Fe.py.FeHR"),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = FALSE)
Mo.anox.py.no_Al.rf.partial <- Mo.anox.py.no_Al.rf.results$partial.plot.data
Mo.anox.py.no_Al.rf.import <- Mo.anox.py.no_Al.rf.results$importance.data

mo.no_Al.pdp <- interp.pdp(data = Mo.anox.py.no_Al.rf.partial, age.rounding.factor = 1, n = n)

Uranium

U.anox.no_Al.rf.results <- Monte.Carlo.rF(data = U.anox.rf,
                       resp.var = "U..ppm.",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "U..ppm.",
                                "TOC..wt.."),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = FALSE)
U.anox.no_Al.rf.partial <- U.anox.no_Al.rf.results$partial.plot.data
U.anox.no_Al.rf.import <- U.anox.no_Al.rf.results$importance.data

u.no_Al.pdp <- interp.pdp(data = U.anox.no_Al.rf.partial, age.rounding.factor = 1, n = n)

Proportion euxinic

Note that we hide warnings here because of random forest warning about using binary variable in a regression (here we do want to do a regression to replicate the broad approach of Sperling et al. 2015, Nature).

Fepy.anox.no_Al.rf.results <- Monte.Carlo.rF(data = Fepy.anox.rf,
                       resp.var = "euxinic.Fe",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "euxinic.Fe"),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = FALSE)
Fepy.anox.no_Al.rf.partial <- Fepy.anox.no_Al.rf.results$partial.plot.data
Fepy.anox.no_Al.rf.import <- Fepy.anox.no_Al.rf.results$importance.data

prop_eux.no_Al.pdp <- interp.pdp(data = Fepy.anox.no_Al.rf.partial, age.rounding.factor = 1, n = n)

Total Organic Carbon

TOC.all.no_Al.rf.results <- Monte.Carlo.rF(data = TOC.all.rf,
                       resp.var = "TOC..wt..",
                       vars = c("age.model",
                                "site.type",
                                "metamorphic.bin",
                                "basin.type",
                                "site.latitude",
                                "site.longitude",
                                "lithology.name",
                                "environmental.bin",
                                "TOC..wt.."),
                       n = n,
                       run.age.unc = TRUE,
                       run.partial.context = TRUE,
                       run.Al.impute = FALSE)
TOC.all.no_Al.rf.partial <- TOC.all.no_Al.rf.results$partial.plot.data
TOC.all.no_Al.rf.import <- TOC.all.no_Al.rf.results$importance.data

TOC.no_Al.pdp <- interp.pdp(data = TOC.all.no_Al.rf.partial, age.rounding.factor = 1, n = n)

Combine to generate summary plot

Mo.no_Al.plot.for.sum <- ggplot(mo.no_Al.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_cartesian(xlim=rev(c(298.9,1000)), ylim=c(-.4,62),expand=FALSE)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Mo (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

U.no_Al.plot.for.sum <- ggplot(u.no_Al.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-.4,62),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("U (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         
        legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

prop_eux.no_Al.plot.for.sum <- ggplot(prop_eux.no_Al.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_cartesian(xlim=rev(c(298.9,1000)), ylim=c(-.01,1.03),expand=FALSE)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Proportion euxinic")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

TOC.no_Al.plot.for.sum <- ggplot(TOC.no_Al.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-0.04,8.3),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("TOC (wt%)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         
        legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

summary.plot <- ggarrange2(Mo.no_Al.plot.for.sum, prop_eux.no_Al.plot.for.sum, U.no_Al.plot.for.sum, TOC.no_Al.plot.for.sum, ncol=2, heights=c(1,1))

ggsave("Figure Sx Partial Dependence Plot (no Al) with shading.pdf", summary.plot, height=14, width=26)

10. Variable importance plots

We will summarize the variable importance of the predictor variables in our primary Monte Carlo random forest analyses, using box and whisker plots to illustrate the distributions generated by our Monte Carlo approach.

Mo.MSE.plot <- ggplot(Mo.anox.py.rf.import, aes(x = Variable, y = IncMSE))+stat_boxplot(geom ='errorbar', width = 0.6, size=0.6, color="grey20", lwd=.1)+ geom_boxplot(outlier.alpha = .5, outlier.shape=16, outlier.size=2, outlier.color="grey20", color="grey20", lwd=0.6, fill="grey80", fatten = 1.3)+coord_flip()+
  theme_bw()+
  ylab("Increased Mean\nSquared Error (%)")+
  ggtitle("Molybdenum")+
  theme(panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1.1),
        axis.title = element_text(size=30),
        axis.text =  element_text(size=24, colour="black"),
        plot.title = element_text(size=30, face = "bold"),
        plot.margin = ggplot2::margin(10,30,10,10), 
        legend.position="none",
        axis.title.y = element_blank(),
        axis.title.x = element_text(margin = ggplot2::margin(t = 10, r = 0, b = 0, l = 0)),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())
## Warning: Duplicated aesthetics after name standardisation: size
Mo.Node.plot <- ggplot(Mo.anox.py.rf.import, aes(x = Variable, y = IncNodePurity))+stat_boxplot(geom ='errorbar', width = 0.6, size=0.6, color="grey20", lwd=.1)+ geom_boxplot(outlier.alpha = .5, outlier.shape=16, outlier.size=2, outlier.color="grey20", color="grey20", lwd=0.6, fill="grey80", fatten = 1.3)+coord_flip()+
  theme_bw()+
  ylab("Increased\nNode Purity")+
  ggtitle("Molybdenum")+
  theme(panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1.1),
        axis.title = element_text(size=30),
        axis.text =  element_text(size=24, colour="black"),
        plot.title = element_text(size=30, face = "bold"),
        plot.margin = ggplot2::margin(10,30,10,10), 
        legend.position="none",
        axis.title.y = element_blank(),
        axis.title.x = element_text(margin = ggplot2::margin(t = 10, r = 0, b = 0, l = 0)),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())
## Warning: Duplicated aesthetics after name standardisation: size
U.MSE.plot <- ggplot(U.anox.rf.import, aes(x = Variable, y = IncMSE))+stat_boxplot(geom ='errorbar', width = 0.6, size=0.6, color="grey20", lwd=.1)+ geom_boxplot(outlier.alpha = .5, outlier.shape=16, outlier.size=2, outlier.color="grey20", color="grey20", lwd=0.6, fill="grey80", fatten = 1.3)+coord_flip()+
  theme_bw()+
  ylab("Increased Mean\nSquared Error (%)")+
  ggtitle("Uranium")+
  theme(panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1.1),
        axis.title = element_text(size=30),
        axis.text =  element_text(size=24, colour="black"),
        plot.title = element_text(size=30, face = "bold"),
        plot.margin = ggplot2::margin(10,30,10,10), 
        legend.position="none",
        axis.title.y = element_blank(),
        axis.title.x = element_text(margin = ggplot2::margin(t = 10, r = 0, b = 0, l = 0)),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())
## Warning: Duplicated aesthetics after name standardisation: size
U.Node.plot <- ggplot(U.anox.rf.import, aes(x = Variable, y = IncNodePurity))+stat_boxplot(geom ='errorbar', width = 0.6, size=0.6, color="grey20", lwd=.1)+ geom_boxplot(outlier.alpha = .5, outlier.shape=16, outlier.size=2, outlier.color="grey20", color="grey20", lwd=0.6, fill="grey80", fatten = 1.3)+coord_flip()+
  theme_bw()+
  ylab("Increased\nNode Purity")+
  ggtitle("Uranium")+
  theme(panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1.1),
        axis.title = element_text(size=30),
        axis.text =  element_text(size=24, colour="black"),
        plot.title = element_text(size=30, face = "bold"),
        plot.margin = ggplot2::margin(10,30,10,10), 
        legend.position="none",
        axis.title.y = element_blank(),
        axis.title.x = element_text(margin = ggplot2::margin(t = 10, r = 0, b = 0, l = 0)),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())
## Warning: Duplicated aesthetics after name standardisation: size
prop_eux.MSE.plot <- ggplot(Fepy.anox.rf.import, aes(x = Variable, y = IncMSE))+stat_boxplot(geom ='errorbar', width = 0.6, size=0.6, color="grey20", lwd=.1)+ geom_boxplot(outlier.alpha = .5, outlier.shape=16, outlier.size=2, outlier.color="grey20", color="grey20", lwd=0.6, fill="grey80", fatten = 1.3)+coord_flip()+
  theme_bw()+
  ylab("Increased Mean\nSquared Error (%)")+
  ggtitle("Proportion Euxinic")+
  theme(panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1.1),
        axis.title = element_text(size=30),
        axis.text =  element_text(size=24, colour="black"),
        plot.title = element_text(size=30, face = "bold"),
        plot.margin = ggplot2::margin(10,30,10,10), 
        legend.position="none",
        axis.title.y = element_blank(),
        axis.title.x = element_text(margin = ggplot2::margin(t = 10, r = 0, b = 0, l = 0)),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())
## Warning: Duplicated aesthetics after name standardisation: size
prop_eux.Node.plot <- ggplot(Fepy.anox.rf.import, aes(x = Variable, y = IncNodePurity))+stat_boxplot(geom ='errorbar', width = 0.6, size=0.6, color="grey20", lwd=.1)+ geom_boxplot(outlier.alpha = .5, outlier.shape=16, outlier.size=2, outlier.color="grey20", color="grey20", lwd=0.6, fill="grey80", fatten = 1.3)+coord_flip()+
  theme_bw()+
  ylab("Increased\nNode Purity")+
  ggtitle("Proportion Euxinic")+
  theme(panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1.1),
        axis.title = element_text(size=30),
        axis.text =  element_text(size=24, colour="black"),
        plot.title = element_text(size=30, face = "bold"),
        plot.margin = ggplot2::margin(10,30,10,10), 
        legend.position="none",
        axis.title.y = element_blank(),
        axis.title.x = element_text(margin = ggplot2::margin(t = 10, r = 0, b = 0, l = 0)),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())
## Warning: Duplicated aesthetics after name standardisation: size
TOC.MSE.plot <- ggplot(TOC.all.rf.import, aes(x = Variable, y = IncMSE))+stat_boxplot(geom ='errorbar', width = 0.6, size=0.6, color="grey20", lwd=.1)+ geom_boxplot(outlier.alpha = .5, outlier.shape=16, outlier.size=2, outlier.color="grey20", color="grey20", lwd=0.6, fill="grey80", fatten = 1.3)+coord_flip()+
  theme_bw()+
  ylab("Increased Mean\nSquared Error (%)")+
  ggtitle("Total Organic Carbon")+
  theme(panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1.1),
        axis.title = element_text(size=30),
        axis.text =  element_text(size=24, colour="black"),
        plot.title = element_text(size=30, face = "bold"),
        plot.margin = ggplot2::margin(10,30,10,10), 
        legend.position="none",
        axis.title.y = element_blank(),
        axis.title.x = element_text(margin = ggplot2::margin(t = 10, r = 0, b = 0, l = 0)),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())
## Warning: Duplicated aesthetics after name standardisation: size
TOC.Node.plot <- ggplot(TOC.all.rf.import, aes(x = Variable, y = IncNodePurity))+stat_boxplot(geom ='errorbar', width = 0.6, size=0.6, color="grey20", lwd=.1)+ geom_boxplot(outlier.alpha = .5, outlier.shape=16, outlier.size=2, outlier.color="grey20", color="grey20", lwd=0.6, fill="grey80", fatten = 1.3)+coord_flip()+
  theme_bw()+
  ylab("Increased\nNode Purity")+
  ggtitle("Total Organic Carbon")+
  theme(panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1.1),
        axis.title = element_text(size=30),
        axis.text =  element_text(size=24, colour="black"),
        plot.title = element_text(size=30, face = "bold"),
        plot.margin = ggplot2::margin(10,30,10,10), 
        legend.position="none",
        axis.title.y = element_blank(),
        axis.title.x = element_text(margin = ggplot2::margin(t = 10, r = 0, b = 0, l = 0)),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())
## Warning: Duplicated aesthetics after name standardisation: size
Imp.sum <- ggarrange2(Mo.MSE.plot, 
           Mo.Node.plot, 
           U.MSE.plot, 
           U.Node.plot, 
           prop_eux.MSE.plot, 
           prop_eux.Node.plot, 
           TOC.MSE.plot, 
           TOC.Node.plot, 
           ncol=2)

ggsave("Figure Sx Variable importance plots for Monte Carlo random forest.pdf", Imp.sum, height=27, width=17)

11. Appendix - Three phase plot evolution for talks

No shading.

Mo.plot.for.sum.no.shade <- ggplot(mo.pdp, aes(x=Age))+
  #annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  #annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_cartesian(xlim=rev(c(298.9,1000)), ylim=c(-.4,62),expand=FALSE)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Mo (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

U.plot.for.sum.no.shade <- ggplot(u.pdp, aes(x=Age))+
  #annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  #annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-.4,62),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("U (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         
        legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

prop_eux.plot.for.sum.no.shade <- ggplot(prop_eux.pdp, aes(x=Age))+
  #annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  #annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_cartesian(xlim=rev(c(298.9,1000)), ylim=c(-.01,1.03),expand=FALSE)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Proportion euxinic")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

TOC.plot.for.sum.no.shade <- ggplot(TOC.pdp, aes(x=Age))+
  #annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  #annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-0.04,8.3),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("TOC (wt%)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         
        legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

summary.plot.no.shade <- ggarrange2(Mo.plot.for.sum.no.shade, prop_eux.plot.for.sum.no.shade, U.plot.for.sum.no.shade, TOC.plot.for.sum.no.shade, ncol=2, heights=c(1,1))

ggsave("Figure 2 Partial Dependence Plot no shading.pdf", summary.plot, height=14, width=26)

Neoproterozoic shading.

Mo.plot.for.sum.shade.1 <- ggplot(mo.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  #annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_cartesian(xlim=rev(c(298.9,1000)), ylim=c(-.4,62),expand=FALSE)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Mo (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

U.plot.for.sum.shade.1 <- ggplot(u.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  #annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-.4,62),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("U (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         
        legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

prop_eux.plot.for.sum.shade.1 <- ggplot(prop_eux.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  #annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_cartesian(xlim=rev(c(298.9,1000)), ylim=c(-.01,1.03),expand=FALSE)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Proportion euxinic")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

TOC.plot.for.sum.shade.1 <- ggplot(TOC.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  #annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-0.04,8.3),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("TOC (wt%)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         
        legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

summary.plot.shade.1<- ggarrange2(Mo.plot.for.sum.shade.1, prop_eux.plot.for.sum.shade.1, U.plot.for.sum.shade.1, TOC.plot.for.sum.shade.1, ncol=2, heights=c(1,1))

ggsave("Figure 2 Partial Dependence Plot with shading part 1.pdf", summary.plot.shade.1, height=14, width=26)

Neoproterozoic and early Paleozoic shading.

Mo.plot.for.sum.shade.2 <- ggplot(mo.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_cartesian(xlim=rev(c(298.9,1000)), ylim=c(-.4,62),expand=FALSE)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Mo (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

U.plot.for.sum.shade.2 <- ggplot(u.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-.4,62),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("U (ppm)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         
        legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

prop_eux.plot.for.sum.shade.2 <- ggplot(prop_eux.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_cartesian(xlim=rev(c(298.9,1000)), ylim=c(-.01,1.03),expand=FALSE)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("Proportion euxinic")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         legend.justification = c(0, 0),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

TOC.plot.for.sum.shade.2 <- ggplot(TOC.pdp, aes(x=Age))+
  annotate(geom="rect", xmax=Inf, xmin=541, ymin=-Inf, ymax=Inf, fill="#C9DEE8", alpha=0.2)+
  annotate(geom="rect", xmax=541, xmin=418, ymin=-Inf, ymax=Inf, fill="#9DC2D6", alpha=0.4)+
  #annotate(geom="rect", xmax=418, xmin=-Inf, ymin=-Inf, ymax=Inf, fill="#619FBB", alpha=0.4)+
  geom_ribbon(aes(ymin = min, ymax = max), alpha=1, fill="grey85", color="grey70", size=.3)+
  geom_ribbon(aes(ymin = perc.25, ymax = perc.75), alpha=1, fill="grey60", color="grey50", size=.5)+
  theme_bw()+
  coord_geo(xlim=rev(c(298.9,1002)), expand=FALSE, ylim=c(-0.04,8.3),
            pos = as.list(rep("bottom", 1)),
            abbrv=list(TRUE),
            dat = list(periods.edit),
            height = list(unit(2, "lines")),
            bord=list(c("left", "bottom", "right")), lwd=as.list(c(1)), size=8)+
  scale_x_reverse(breaks=c(300,400,500,600,700,800,900,1000))+
  ylab("TOC (wt%)")+xlab("Time (Ma)")+
  theme(plot.margin = ggplot2::margin(.1,1,.3,1,"cm"),panel.border = element_rect(fill=NA,color="black", size=2,linetype="solid"),
        axis.ticks = element_line(size=1),
        axis.line = element_line(lineend = 'square'),
        axis.title = element_text(size=34),
        axis.text = element_text( size=26, color="black"),
        legend.title = element_text(size=20),
        legend.text = element_text( size=16),
        axis.ticks.length = unit(5, "points"),
        legend.position="top",         
        legend.justification = c(0, 0),
        axis.title.y = element_text(vjust=3),
        panel.grid.major = element_blank(),panel.grid.minor = element_blank())

summary.plot.shade.2 <- ggarrange2(Mo.plot.for.sum.shade.2, prop_eux.plot.for.sum.shade.2, U.plot.for.sum.shade.2, TOC.plot.for.sum.shade.2, ncol=2, heights=c(1,1))

ggsave("Figure 2 Partial Dependence Plot with shading part 2.pdf", summary.plot.shade.2, height=14, width=26)